Factor analysis based on SHapley Additive exPlanations for sepsis-associated encephalopathy in ICU mortality prediction using XGBoost: a retrospective study based on two large databases

Objective: Sepsis-associated encephalopathy (SAE) is strongly linked to a high mortality risk and frequently occurs in both the acute and late phases of sepsis. The objective of this study was to construct and verify a predictive model for mortality in ICU patients with SAE.
Methods: The study selected 7,576 patients with SAE from the MIMIC-IV database according to the inclusion criteria and randomly divided them into training (n = 5,303, 70%) and internal validation (n = 2,273, 30%) sets. According to the same criteria, 1,573 patients from the eICU-CRD database were included as an external test set. Independent risk factors for ICU mortality were identified using the Extreme Gradient Boosting (XGBoost) algorithm, and prediction models were constructed and verified using the validation set. The receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) were used to evaluate the discrimination ability of the model. The SHapley Additive exPlanations (SHAP) approach was applied to determine the Shapley values for specific patients, account for the effects of factors attributed to the model, and examine how specific traits affect the output of the model.
Results: The survival rate of patients with SAE in the MIMIC-IV database was 88.6%, and that of the 1,573 patients in the eICU-CRD database was 89.1%. The ROC curve of the XGBoost model indicated good discrimination. The AUCs for the training, internal validation, and external validation sets were 0.908, 0.898, and 0.778, respectively. The impact of each parameter on the XGBoost model was depicted using a SHAP plot, covering both positive (acute physiology score III, vasopressin, age, red blood cell distribution width, partial thromboplastin time, and norepinephrine) and negative (Glasgow Coma Scale) effects.
Conclusion: A prediction model developed using XGBoost can accurately predict the ICU mortality of patients with SAE. The SHAP approach can enhance the interpretability of the machine-learning model and support clinical decision-making.


Introduction
Sepsis, a syndrome caused by dysfunction of organs including the central nervous system (CNS), heart, and lungs (1-3) due to dysregulation of the host response to infection, is the most common cause of death in intensive care unit patients worldwide. One manifestation of sepsis-induced cerebral dysfunction is sepsis-associated encephalopathy (SAE), which is defined as diffuse cerebral dysfunction secondary to organic infection in the absence of an obvious central nervous system infection (4).
The pathophysiology of SAE is intricate, arising from a convergence of inflammatory and non-inflammatory processes impacting various categories of cerebral cells. Significant mechanisms encompass heightened microglial activation, disruption of the blood-brain barrier (BBB), and the perpetuation of an extended inflammatory reaction (5). Upon the initial emergence of sepsis, an excessive immune-inflammatory response is incited, setting in motion the infiltration of inflammatory mediators into cerebral tissue, thereby activating microglial cells. This activation establishes a cytotoxic milieu, instigating the release of reactive oxygen species, nitric oxide (6), and glutamate as a countermeasure against sepsis. Nevertheless, the CNS is notably vulnerable to neurotoxic agents such as free radicals, inflammatory mediators, and intravascular proteins, which precipitates a malfunction in the BBB (7). The relentless activation of microglia perpetuates a deleterious cycle, culminating in aberrant neuronal performance and cell death, thereby exacerbating BBB impairment and the progression of SAE. In addition, sepsis damages the hippocampus, cortex, cerebellum, and brainstem. Sepsis-driven brain damage occurs in a diffuse form and is strongly associated with cognitive impairment.
Clinicians must exclude primary CNS disorders, sedation-related cognitive disorders, metabolic encephalopathies, and poisonings before diagnosing SAE on the basis of cognitive and neuropsychiatric deficits, manifestations of delirium, or a Glasgow Coma Scale (GCS) score of less than 15 (8). Globally, up to 50% of intensive care unit (ICU) patients present with SAE during sepsis (4,9), which tends to increase the length of stay and mortality of septic patients in the ICU (10). The current lack of specific treatment options and insufficient understanding of the underlying mechanisms of SAE are the most common causes of poor prognosis in sepsis. Therefore, the aim of this study was to investigate the independent risk factors for ICU death in patients with SAE and to develop a predictive model to quantify the likelihood of ICU death in patients with SAE.

Data source
Data for this study were obtained from the MIMIC-IV and eICU-CRD databases, with the former being a multiparametric, structured single-center critical-care database published in 2003 that includes clinically available data on more than 380,000 patients during 2008-2019. There was no requirement to obtain permission from individual patients or ethical approval statements because the initiative had no impact on clinical care and none of the patients in the database could be identified (11). Our study also followed the guidelines of the Declaration of Helsinki and the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement (12).
The eICU-CRD database contains data from the ICU wards of numerous hospitals in the US. It contains routine data on 200,859 patients obtained from more than 300 hospitals in the US during 2014 and 2015 (13). No specific patient permission was needed because both databases use anonymous health data.

Patient population
At present, precise diagnostic modalities for SAE are lacking. Clinical diagnosis relies on exclusion and requires discrimination from central nervous system infections, metabolic encephalopathy (a widespread yet potentially reversible cerebral dysfunction arising from metabolic or toxic origins), excessive sedative ingestion, and withdrawal manifestations with the potential to impact sensory faculties.
Patients with a Sequential Organ Failure Assessment (SOFA) score ≥ 2 based on the Sepsis-3 classification and a GCS score < 15 or delirium on the day before admission to the ICU were considered SAE patients. The exclusion criteria were (1) presence of primary brain injury, (2) psychiatric disorders and neurological diseases, (3) metabolic, hepatic, hypertensive, or toxic encephalopathy, (4) severe electrolyte disturbance or deglycation, (5) patients who were intubated, given analgesics, and sedated at the time of admission, (6) long-term alcohol or drug abuse, or (7) an ICU stay of <24 h. Figure 1 depicts the flow chart for case inclusion.
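The inclusion and exclusion rules above can be illustrated with a minimal Python sketch. This is not the authors' extraction code: the record type `IcuStay`, its field names, and the single `has_exclusion` flag (standing in for exclusion criteria 1 to 6) are hypothetical, and the example stays are invented purely to exercise the rule.

```python
from dataclasses import dataclass


@dataclass
class IcuStay:
    """Hypothetical record for one ICU stay (field names are assumptions)."""
    sofa: int                    # SOFA score (Sepsis-3 threshold: >= 2)
    gcs: int                     # Glasgow Coma Scale, 3-15
    delirium: bool               # delirium documented
    icu_hours: float             # length of ICU stay in hours
    has_exclusion: bool = False  # any of exclusion criteria (1)-(6) present


def is_sae_case(stay: IcuStay) -> bool:
    """Return True if the stay meets the SAE selection rules described in the text."""
    septic = stay.sofa >= 2
    altered_mentation = stay.gcs < 15 or stay.delirium
    excluded = stay.has_exclusion or stay.icu_hours < 24  # criterion (7)
    return septic and altered_mentation and not excluded


# Invented stays, only to exercise the rule.
stays = [
    IcuStay(sofa=4, gcs=13, delirium=False, icu_hours=72),  # included
    IcuStay(sofa=1, gcs=12, delirium=True, icu_hours=96),   # SOFA below 2
    IcuStay(sofa=5, gcs=15, delirium=True, icu_hours=48),   # included via delirium
    IcuStay(sofa=6, gcs=10, delirium=False, icu_hours=12),  # ICU stay < 24 h
    IcuStay(sofa=3, gcs=14, delirium=False, icu_hours=30, has_exclusion=True),
]
cohort = [s for s in stays if is_sae_case(s)]
print(len(cohort))
```

Keeping the rule in one small predicate makes it easy to apply the same criteria to both databases, which is what the external test set requires.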

Observation indicators
This study used Structured Query Language to extract the following basic information of patients from both databases: age, gender, and mean values of vital signs at the time of first ICU admission, including heart rate, respiratory rate, and body temperature. The first laboratory data recorded after ICU admission included ghrelin, lymphocytes, eosinophils, neutrophils, monocytes, hemoglobin, urea nitrogen, platelets, creatinine, glucose, blood urea nitrogen (BUN), hematocrit (HCT), partial thromboplastin time (PTT), white blood cell count (WBC), international normalized ratio (INR), anion gap (AG), mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), mean corpuscular volume (MCV), and red blood cell distribution width (RDW), Severity

Statistical analysis
Statistical processing was performed using R software (version 4.2.2). Continuous variables are expressed as mean ± standard deviation or median (interquartile range, IQR); categorical variables are expressed as number (%). Comparisons of continuous variables were made using the t test or Wilcoxon rank sum test, and comparisons of proportions were made using the χ² test or Fisher exact test. After the variables were identified by Extreme Gradient Boosting (XGBoost), we used the selected clinical and laboratory variables to construct a prediction model for in-ICU mortality in SAE patients based on the XGBoost algorithm. XGBoost is an improved algorithm based on gradient boosting decision trees that efficiently constructs boosted trees and runs them in parallel (14). The core of the algorithm is to optimize the value of the objective function (15). In model development and comparison, we used a 5-fold cross-validation approach, which provides a more stable and reliable way to measure the performance of the model.
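For reference, the objective function mentioned above is usually written in the XGBoost literature as a training loss plus a penalty on tree complexity. The notation below is the standard one and is introduced here for illustration; the study does not report which loss or regularization settings were used, although a logistic loss would be the conventional choice for a binary outcome such as ICU death.

```latex
\mathcal{L} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega\left(f_k\right),
\qquad
\hat{y}_i = \sum_{k=1}^{K} f_k(x_i),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^{2}
```

Here $l$ is the per-patient loss, $f_k$ is the $k$-th tree, $T$ is the number of leaves in a tree, $w$ is the vector of its leaf weights, and $\gamma$ and $\lambda$ are regularization hyperparameters.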
The prediction model was trained and internally validated using training and test sets randomized at a ratio of 7:3. The performance of the prediction model was externally validated using data from patients with SAE in the eICU-CRD database who were selected according to the same criteria. The predictive values of different models were analyzed using the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC), with the AUC providing a quantitative measure of each model's discrimination. In the XGBoost analysis, qualitative data were converted to numerical data, and "yes" and "no" were converted to "1" and "0," respectively.
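The split and the discrimination metric described above can be sketched in plain Python. The helper names `split_70_30` and `roc_auc`, the random seed, and the toy outcome and risk values are assumptions made for illustration; the study itself performed these steps in R. The AUC is computed here from its rank interpretation, that is, the probability that a randomly chosen non-survivor receives a higher predicted risk than a randomly chosen survivor.

```python
import random


def split_70_30(ids, seed=0):
    """Shuffle patient ids and split them 70:30 (training : internal validation)."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * 0.7)
    return ids[:n_train], ids[n_train:]


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as one half, which matches the Mann-Whitney U formulation.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# 7,576 MIMIC-IV patients split 7:3 gives the set sizes reported in the text.
train_ids, valid_ids = split_70_30(range(7576))
print(len(train_ids), len(valid_ids))  # 5303 2273

# Toy data: "yes"/"no" outcomes encoded as 1/0, as described in the text.
outcome = [1 if v == "yes" else 0 for v in ["yes", "no", "no", "yes", "no"]]
predicted_risk = [0.9, 0.2, 0.4, 0.35, 0.1]
print(roc_auc(outcome, predicted_risk))
```

The pairwise loop is quadratic in the number of patients, so a sorting-based implementation would be preferable for cohorts of this size; the simple form is used here because it mirrors the definition directly.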
An additional interpretation method, SHapley Additive exPlanations (SHAP), was applied to the XGBoost model to make it easier to interpret. SHAP is a technique used to explain the output of any machine-learning model (16). A SHAP summary plot was used to present the effect of the characteristics attributed to the model. Colors in the scatter plot intuitively represent the correlation between the characteristic value and the anticipated probability. The importance of specific features and their impacts on the output of the model were examined using the SHAP dependence plot. A SHAP force plot was used to illustrate how important characteristics affect the output of the final model across all patients.
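To make the idea behind SHAP concrete, the sketch below computes exact Shapley values for a three-feature toy risk function by enumerating all feature coalitions. It is an illustration of the underlying definition only: the function `toy_risk`, its coefficients, and the patient and reference values are invented, and SHAP implementations for tree models use a much faster algorithm rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance by enumerating feature coalitions.

    Features outside a coalition are set to their baseline value. The cost is
    exponential in the number of features, so this is only practical for toys.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                members = set(subset)
                total += weight * (value(members | {i}) - value(members))
        phi.append(total)
    return phi


def toy_risk(z):
    """Hypothetical risk score with an interaction term; NOT the study's model."""
    severity, vasopressor, gcs = z
    return 0.02 * severity + 0.5 * vasopressor - 0.1 * gcs + 0.01 * severity * vasopressor


patient = [80, 1, 9]     # invented severity score, vasopressor use (1 = yes), GCS
reference = [40, 0, 15]  # invented reference patient
phi = shapley_values(toy_risk, patient, reference)

# Additivity: reference prediction plus the attributions equals the prediction.
print(phi)
print(toy_risk(reference) + sum(phi), toy_risk(patient))
```

In this toy case the low GCS score receives a positive attribution, in line with the direction reported for GCS in the study, and the interaction term is shared between the two features that produce it.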

Baseline patient characteristics
This study used 1,573 patients from the eICU-CRD database as the external test set, and 7,576 patients from the MIMIC-IV database were randomly split into a training (n = 5,303, 70%) and an internal test (n = 2,273, 30%) set. Figure 1 shows the flowchart of patient cohorts.
The demographic profiles and baseline patient information for the training and test sets are presented in Tables 1, 2. Table 3 presents the baseline characteristics of the study cohort from the MIMIC-IV database, stratified by outcome. The average age of patients with SAE was higher in the deceased group than in the survivor group. Furthermore, the incidence rates of myocardial infarction, peripheral vascular disease, dementia, diabetes, sclerosis, and liver disease varied between these two groups. Conversely, no statistically significant differences were observed between the two groups concerning myocardial infarction, peripheral vascular disease, dementia, diabetes, sodium levels, and MCH. The baseline characteristics of the subjects from the eICU-CRD database, stratified by outcome, are summarized in Table 4. Significant differences between groups were found in sedative usage, analgesic administration, vasopressin and norepinephrine dosages, GCS score, SOFA score, lactate levels, creatinine values, bicarbonate levels, BUN, PTT, INR, AG, MCHC, RDW, respiratory rate, and body temperature.

Feature selection
The XGBoost algorithm identified acute physiology score III (APSIII), vasopressin, GCS score, PTT, norepinephrine, age, RDW, and length of ICU stay as independent predictors of ICU mortality in patients with SAE. Figure 2A presents the importance of each factor. APSIII had the highest score, indicating that illness severity was the most relevant and important factor. Smaller APSIII values indicate a lower output from the model. The GCS score had the smallest effect on the model. Figure 2B presents the SHAP summary plot, which reflects the influence of each factor using the SHAP value in XGBoost and whether they had a positive or negative effect. The SHAP plot illustrates the influence of each parameter on the XGBoost model, including the positive (APSIII, vasopressin, age, RDW, PTT, and norepinephrine) and negative (GCS) effects.
Each of the eight factors is represented by a SHAP dependence plot in Figure 3, which illustrates how different characteristics influenced the XGBoost model results. Positive SHAP values for specific factors represent an elevated mortality risk. We found that mortality was correlated with higher APSIII, age, RDW, and PTT, and a lower GCS score. Both longer and shorter ICU stays were associated with lower survival rates. Patients who receive vasopressin and norepinephrine may experience higher mortality rates.
When GCS scores were low, the SHAP interaction values of APSIII with the GCS score decreased as APSIII increased (Figure 4A). The interaction effect of APSIII with norepinephrine and vasopressin (Figures 4B,C) did not seem to be affected by differences in norepinephrine or vasopressin use. Samples with the highest SHAP values for death in the ICU were often accompanied by vasopressin use and a shorter ICU stay. When the GCS score was higher, the value of the interaction between time and GCS score decreased as the length of ICU stay increased (Figure 4D). The interaction effect of time in ICU with norepinephrine (Figure 4E) did not appear to be affected by the use of norepinephrine. The value of the interaction between time and vasopressin use decreased as the length of ICU stay increased (Figure 4F). When the GCS score was high, the interaction value between age and GCS score decreased as age increased (Figure 4G), and the interaction value between age and GCS score decreased to a negative value at 73 years old. When norepinephrine was used, the value of the interaction between age and norepinephrine increased with age (Figure 4H) and became positive at 73 years old. The SHAP interaction values for age with vasopressin use also increased with age (Figure 4I).
The final output was obtained as the sum of the attributions from each predictor, as seen in the SHAP force plot (Figure 5), which displays these SHAP values stacked for each observation.
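The two readings of SHAP values used in this section, a global ranking by mean absolute value and a per-patient sum of attributions, can be sketched as follows. All numbers are invented for illustration and are not results from the study. The sketch also assumes that attributions are expressed in log-odds units and are added to a base value before being mapped to a probability, which is a common convention for binary classifiers but is not stated in the text.

```python
from math import exp

# Hypothetical per-patient SHAP values in log-odds units (illustrative only).
feature_names = ["APSIII", "vasopressin", "GCS"]
shap_rows = [
    [0.9, 0.4, -0.3],
    [-0.6, 0.0, -0.5],
    [0.2, 0.7, 0.6],
]
base_value = -2.0  # assumed average model output (log-odds) over the training set


def mean_abs_importance(rows, names):
    """Rank features by mean absolute SHAP value (a global importance score)."""
    n = len(rows)
    scores = {name: sum(abs(r[j]) for r in rows) / n for j, name in enumerate(names)}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def predicted_probability(row, base):
    """Add the attributions to the base value, then map log-odds to a probability."""
    log_odds = base + sum(row)
    return 1.0 / (1.0 + exp(-log_odds))


ranking = mean_abs_importance(shap_rows, feature_names)
print([name for name, _ in ranking])
print([round(predicted_probability(r, base_value), 3) for r in shap_rows])
```

A force plot is essentially a picture of `base_value + sum(row)` for one patient, with each attribution drawn as a segment that pushes the output up or down.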

Discrimination ability
The ROC curve was used to evaluate the discrimination ability of the model. The XGBoost model had AUC values of 0.908, 0.898, and 0.778 in the training, internal validation, and external validation sets, respectively (Figure 6).

Discussion
SAE is a multifaceted encephalopathy, a widespread cerebral impairment stemming from sepsis. It presents with delirium, coma, cognitive deficits encompassing the loss of learning and memory, and seizures. The pathophysiology of SAE is only partially understood. Although advances in sepsis research and treatment have recently improved prognostic outcomes, the mortality rate of SAE remains high. Identifying risk factors is essential for understanding the prognosis of SAE.
An XGBoost model was constructed and validated in this study to predict ICU mortality in patients with SAE. The importance analysis of the factors in the XGBoost model suggested that APSIII, vasopressin, age, length of ICU stay, RDW, norepinephrine, PTT, and GCS score are strong predictors of ICU mortality in SAE. Judged by its AUC values, the XGBoost prediction model had good predictive power. SHAP also offered a credible visual interpretation of the predictions, encompassing both positive and negative impacts. In this study we not only calculated values of general parameters for predicting the probability of death in the ICU, but also presented a visual explanation for specific patients using SHAP plots. The predictive value of a clinical factor for the XGBoost model increased with the average absolute SHAP value of that factor. Each factor was averaged to provide a homogeneous perspective, and the interpretation of SHAP was based on each individual patient (17). SHAP has two advantages: (1) it considers the effects of individual factors and the synergy between factors, which can solve the multicollinearity problem, and (2) SHAP determines whether the influence is favorable (18).
In the current investigation, APSIII emerged as the weightiest contributor in the importance plot, underscoring its robust capacity to predict ICU mortality in patients with SAE. As an integral part of the APACHE system, APSIII has shown its ability to predict mortality in patients with severe sepsis and septic shock (19). Previous studies have found that survivors have significantly lower APACHE III scores than deceased patients and that higher scores (OR 1.11, 95% CI 1.05-1.18, p = 0.001) are associated with increased in-hospital all-cause mortality in patients with severe sepsis (20,21). Inflammatory response, immunosuppression, and multiple organ dysfunction syndrome may be responsible for the high scores in patients with SAE (1,22).
A predominant clinical characteristic of SAE is an altered level of consciousness. In milder cases, there is a reduction in attention and alertness, accompanied by symptoms such as anxiety and delirium. In severe cases, it may lead to stupor or coma. Long-term cognitive impairments encompass deficits in memory, attention, verbal fluency, and executive functions, significantly impacting the quality of life of survivors. A study that focused on identifying early and potentially modifiable factors of SAE upon admission to the ICU found that even slight alterations in cognitive function, as defined by a GCS score of 13-14, were independently correlated with mortality at the point of ICU admission (10,23). Furthermore, our findings confirm the independent role of the GCS score as a risk factor for ICU mortality in SAE patients. This reinforces the utility of the GCS score and APSIII in gauging the severity and prognosis of patients with SAE.
Norepinephrine and vasopressin are now commonly used in clinical practice as vasoactive drugs. In the Surviving Sepsis Campaign guidelines, norepinephrine is recommended as the vasopressor for sepsis treatment (24), and vasopressin is often used as an adjunct in sepsis. Maheshwari et al. found that the significant blood pressure response to VAS was substantially linked to reduced survival probability in patients with septic shock (25). However, there are only treatment guidelines for sepsis and septic shock (15,17), with a lack of specific treatment guidelines for SAE (24,26). To realistically assess ICU mortality in SAE, clinicians should be aware of other treatment options for SAE in order to improve the corresponding survival assessment system. Currently, no specific therapeutic interventions are tailored for SAE. Treatment protocols are based on the comprehensive management of sepsis, with a predominant focus on symptoms associated with cerebral disorders, while endeavoring to minimize harm to the central nervous system. Early-stage resuscitation is acknowledged as a pivotal therapeutic strategy for sepsis, and the administration of vasoactive agents correlated with normal arterial pressure subsequent to initial fluid therapy can mitigate the severity of sepsis (27). Furthermore, glucocorticoids, alternative markers, and modulators of the neuroimmune axis have been under consideration for addressing sepsis-induced cognitive impairments (28,29). Indoleamine 2,3-dioxygenase, which affects the inflammatory cascade, has been identified as a potential therapeutic target for central nervous system disorders, fostering cognitive enhancement in sepsis patients (30).
Most of the patients with SAE in this study stayed in the ICU for less than 1 week, and the length of ICU stay had an overall negative effect on the outcome. Related studies have found length of ICU stay to be related to disease severity (31), which has important implications for the wise use of scarce medical resources (32). Elderly patients with SAE admitted to the ICU mostly died earlier than did younger patients with SAE, which is supported by the findings of Martin et al. (33). The accumulation of comorbidities harms immune function as age progresses, which causes patients with critical illness to deteriorate more rapidly. Geriatric patients may be more vulnerable to CNS issues, particularly if hypertension, diabetes mellitus, or acute renal injury is the underlying illness (10,34,35). Older hospitalized people need more specialized nursing or rehabilitation care. These findings offer guidance on how to allocate healthcare resources for patients with SAE and offer suggestions for future research projects and patient interventions.
RDW may be valuable for predicting the future course and prognosis of various disorders, including stroke, atrial fibrillation (36), COPD (37), community-acquired pneumonia, and sepsis (38). Currently, several studies have indicated that RDW  40). This finding was also demonstrated in another study, which found that the addition of RDW to the ICU scoring system improved its mortality predictions (41,42). The present study indicated that there was a higher risk of dying from SAE in the ICU when RDW levels were high. Elevated RDW reflects a severe dysregulation of red blood cell homeostasis, which may be an important prognostic factor for SAE. The mechanistic relationship between RDW and the ICU mortality rate in SAE remains unclear. However, research suggests that oxidative stress may contribute to the detrimental impact of RDW on the prognosis of SAE, as oxidative stress levels exhibit a positive correlation with RDW (43). In addition, the inflammatory response in septic patients shortens red blood cell lifespan and impairs red blood cell maturation, resulting in premature release and thus elevating RDW (44,45). Furthermore, proinflammatory cytokines inhibit erythropoietin-induced red blood cell proliferation and maturation, also leading to an increase in RDW (46). This may represent another rationale for the association between RDW and ICU mortality.
One strength of this study was the external validation of the SAE mortality risk model using the eICU-CRD database, which confirmed its efficacy. SHAP allows visualization of XGBoost models, and its sound visual interpretation greatly increases the confidence that clinicians have in the application of machine learning. However, there were some limitations to this study. First, only data from the US were utilized to construct and validate the model, which might reduce its applicability to other regions of the world. Furthermore, in retrospective studies, it is inevitable that certain variables with a substantial amount of missing values are discarded. Various unmeasured

Conclusion
This study validated the efficacy of machine-learning-based XGBoost for early outcome predictions for patients with SAE. The SHAP method improves the readability of XGBoost models and aids doctors in comprehending the logic behind findings obtained from such models.

FIGURE 1
Flowchart of patient cohorts.

FIGURE 2
Predictor variable selection. (A) Importance of the predictor variables selected by XGBoost. (B) The SHAP summary plot.

FIGURE 3
SHAP dependence plot of the XGBoost model. The SHAP dependence plot shows how a single feature affects the output of the XGBoost prediction model. SHAP values above zero for a specific feature represent an increased risk of death. RDW, red blood cell distribution width; PTT, partial thromboplastin time.

FIGURE 4
SHAP interaction plot of the eight most essential features for SAE assessment. SHAP, SHapley Additive explanation; XGBoost, eXtreme Gradient Boosting; SAE, sepsis-associated encephalopathy.

FIGURE 5
SHAP force plot of the XGBoost model. (A) Influence plot of macroscopic features for all samples. (B) Influence plot of macroscopic features for a random portion of the samples. A positive SHAP value represents a positive gain area and a negative SHAP value represents a negative gain area.

TABLE 1
Research subject base information form (internal validation).

TABLE 2
Research subject base information form (external validation).

TABLE 3
Comparison of basic characteristics of the surviving and dead groups in the MIMIC-IV database.

TABLE 4
Comparison of basic characteristics of the surviving and dead groups in the eICU-CRD database.